Dataset statistics
| Number of variables | 10 |
|---|---|
| Number of observations | 1048575 |
| Missing cells | 5898129 |
| Missing cells (%) | 56.2% |
| Duplicate rows | 30893 |
| Duplicate rows (%) | 2.9% |
| Total size in memory | 80.0 MiB |
| Average record size in memory | 80.0 B |
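The percentages in this table follow directly from the raw counts. A minimal sketch of that arithmetic, with the counts copied from the table (the variable names are illustrative and not part of the report):

```python
# Counts copied from the "Dataset statistics" table.
n_variables = 10
n_observations = 1_048_575
missing_cells = 5_898_129
duplicate_rows = 30_893

# Every observation has one cell per variable.
total_cells = n_variables * n_observations

# Missing cells are a share of all cells; duplicates are a share of all rows.
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Total cells:        {total_cells}")
print(f"Missing cells (%):  {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
```

Note that the two percentages use different denominators: cells for missing values, rows for duplicates.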
Variable types
| Categorical | 2 |
|---|---|
| Text | 8 |
Alerts
| Alert | Type |
|---|---|
| Carrier has constant value "BSNL MOBILE" | Constant |
| Dataset has 30893 (2.9%) duplicate rows | Duplicates |
| Name has 60334 (5.8%) missing values | Missing |
| Gender has 1003236 (95.7%) missing values | Missing |
| JobTitle has 1022814 (97.5%) missing values | Missing |
| CompanyName has 1030447 (98.3%) missing values | Missing |
| Email has 712816 (68.0%) missing values | Missing |
| Facebook has 1027992 (98.0%) missing values | Missing |
| Twitter has 1037773 (99.0%) missing values | Missing |
Reproduction
| Analysis started | 2024-07-17 22:08:08.560670 |
|---|---|
| Analysis finished | 2024-07-17 22:08:47.173003 |
| Duration | 38.61 seconds |
| Software version | ydata-profiling v4.9.0 |
| Download configuration | config.json |
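The reported duration is the difference between the two timestamps. A small sketch that recomputes it with the standard library (the timestamp format string is an assumption inferred from how the values are printed):

```python
from datetime import datetime

# Format inferred from the timestamps shown in the table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2024-07-17 22:08:08.560670", FMT)
finished = datetime.strptime("2024-07-17 22:08:47.173003", FMT)

# timedelta.total_seconds() keeps the microsecond precision.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```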
Number
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 8.0 MiB |
| 919000000000.0 | |
|---|---|
| 918000000000.0 | |
| 917000000000.0 |
Length
| Max length | 14 |
|---|---|
| Median length | 14 |
| Mean length | 14 |
| Min length | 14 |
Characters and Unicode
| Total characters | 14680050 |
|---|---|
| Distinct characters | 6 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
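As a rough illustration of the category property, Python's standard `unicodedata` module returns the general category of each character. The sketch below counts categories for one of the values of this variable; the helper name `category_counts` is hypothetical. The standard library reports two-letter abbreviations (`Nd` for Decimal Number, `Po` for Other Punctuation) and does not expose script or block, so those two properties are not reproduced here.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count the Unicode general category of every character in text."""
    return Counter(unicodedata.category(ch) for ch in text)


sample = "917000000000.0"  # one of the three values of this variable
counts = category_counts(sample)

for category, n in counts.most_common():
    print(category, n)
```

Thirteen digits and one full stop per 14-character value is consistent with the two distinct categories reported for this variable.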
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 917000000000.0 |
|---|---|
| 2nd row | 917000000000.0 |
| 3rd row | 917000000000.0 |
| 4th row | 917000000000.0 |
| 5th row | 917000000000.0 |
Common Values
| Value | Count | Frequency (%) |
|---|---|---|
| 919000000000.0 | 820594 | 78.3% |
| 918000000000.0 | 124814 | 11.9% |
| 917000000000.0 | 103167 | 9.8% |
Length
Histogram of lengths of the category
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10485750 | 71.4% |
| 9 | 1869169 | 12.7% |
| 1 | 1048575 | 7.1% |
| . | 1048575 | 7.1% |
| 8 | 124814 | 0.9% |
| 7 | 103167 | 0.7% |
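These character counts can be recomputed from the three common values and their row counts. A minimal sketch, with the values copied from the Common Values table (the variable names are illustrative):

```python
from collections import Counter

# Value -> number of rows, copied from the Common Values table.
common_values = {
    "919000000000.0": 820_594,
    "918000000000.0": 124_814,
    "917000000000.0": 103_167,
}

# Weight the per-value character counts by how many rows hold that value.
char_counts = Counter()
for value, n_rows in common_values.items():
    for ch, per_value in Counter(value).items():
        char_counts[ch] += per_value * n_rows

total_chars = sum(char_counts.values())
for ch, n in char_counts.most_common():
    print(f"{ch!r}: {n} ({100 * n / total_chars:.1f}%)")
```

Each value contains ten zeros, which is why "0" alone accounts for ten of every fourteen characters.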
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 13631475 | 92.9% |
| Other Punctuation | 1048575 | 7.1% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 10485750 | 76.9% |
| 9 | 1869169 | 13.7% |
| 1 | 1048575 | 7.7% |
| 8 | 124814 | 0.9% |
| 7 | 103167 | 0.8% |
Other Punctuation
| Value | Count | Frequency (%) |
|---|---|---|
| . | 1048575 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 14680050 | 100.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 10485750 | 71.4% |
| 9 | 1869169 | 12.7% |
| 1 | 1048575 | 7.1% |
| . | 1048575 | 7.1% |
| 8 | 124814 | 0.9% |
| 7 | 103167 | 0.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 14680050 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 10485750 | 71.4% |
| 9 | 1869169 | 12.7% |
| 1 | 1048575 | 7.1% |
| . | 1048575 | 7.1% |
| 8 | 124814 | 0.9% |
| 7 | 103167 | 0.7% |
Carrier
Categorical
CONSTANT 
| Distinct | 1 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 8.0 MiB |
| BSNL MOBILE |
|---|
Length
| Max length | 11 |
|---|---|
| Median length | 11 |
| Mean length | 11 |
| Min length | 11 |
Characters and Unicode
| Total characters | 11534325 |
|---|---|
| Distinct characters | 9 |
| Distinct categories | 2 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | BSNL MOBILE |
|---|---|
| 2nd row | BSNL MOBILE |
| 3rd row | BSNL MOBILE |
| 4th row | BSNL MOBILE |
| 5th row | BSNL MOBILE |
Common Values
| Value | Count | Frequency (%) |
| BSNL MOBILE | 1048575 |
Length
Histogram of lengths of the category
Common Values (Plot)
| Value | Count | Frequency (%) |
| bsnl | 1048575 | |
| mobile | 1048575 |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| B | 2097150 | 18.2% |
| L | 2097150 | 18.2% |
| S | 1048575 | 9.1% |
| N | 1048575 | 9.1% |
| (space) | 1048575 | 9.1% |
| M | 1048575 | 9.1% |
| O | 1048575 | 9.1% |
| I | 1048575 | 9.1% |
| E | 1048575 | 9.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 10485750 | |
| Space Separator | 1048575 | 9.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| B | 2097150 | |
| L | 2097150 | |
| S | 1048575 | |
| N | 1048575 | |
| M | 1048575 | |
| O | 1048575 | |
| I | 1048575 | |
| E | 1048575 |
Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1048575 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 10485750 | |
| Common | 1048575 | 9.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| B | 2097150 | |
| L | 2097150 | |
| S | 1048575 | |
| N | 1048575 | |
| M | 1048575 | |
| O | 1048575 | |
| I | 1048575 | |
| E | 1048575 |
Common
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1048575 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 11534325 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
|---|---|---|
| B | 2097150 | 18.2% |
| L | 2097150 | 18.2% |
| S | 1048575 | 9.1% |
| N | 1048575 | 9.1% |
| (space) | 1048575 | 9.1% |
| M | 1048575 | 9.1% |
| O | 1048575 | 9.1% |
| I | 1048575 | 9.1% |
| E | 1048575 | 9.1% |
Name
Text
MISSING 
| Distinct | 716717 |
|---|---|
| Distinct (%) | 72.5% |
| Missing | 60334 |
| Missing (%) | 5.8% |
| Memory size | 8.0 MiB |
Length
| Max length | 1623 |
|---|---|
| Median length | 161 |
| Mean length | 13.176837 |
| Min length | 1 |
Characters and Unicode
| Total characters | 13021891 |
|---|---|
| Distinct characters | 197 |
| Distinct categories | 17 |
| Distinct scripts | 3 |
| Distinct blocks | 6 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 664753 |
|---|---|
| Unique (%) | 67.3% |
Sample
| 1st row | Jikku Ayush Kids |
|---|---|
| 2nd row | Goswami Ritu |
| 3rd row | Sruthi |
| 4th row | Arshiya Kadiri |
| 5th row | Singampalli Naresh |
| Value | Count | Frequency (%) |
| reddy | 45390 | 2.2% |
| kumar | 27207 | 1.3% |
| rao | 25770 | 1.2% |
| k | 19798 | 0.9% |
| s | 16036 | 0.8% |
| m | 15343 | 0.7% |
| krishna | 14050 | 0.7% |
| p | 13866 | 0.7% |
| raju | 13332 | 0.6% |
| prasad | 12706 | 0.6% |
| Other values (230060) | 1891934 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 2210093 | |
| (space) | 1107756 | 8.5% |
| i | 766624 | 5.9% |
| r | 731407 | 5.6% |
| n | 683385 | 5.2% |
| h | 615590 | 4.7% |
| e | 550117 | 4.2% |
| u | 501327 | 3.8% |
| d | 452484 | 3.5% |
| s | 377404 | 2.9% |
| Other values (187) | 5025704 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 9586887 | |
| Uppercase Letter | 2079253 | 16.0% |
| Space Separator | 1108128 | 8.5% |
| Other Punctuation | 108029 | 0.8% |
| Decimal Number | 78261 | 0.6% |
| Other Symbol | 34473 | 0.3% |
| Math Symbol | 11205 | 0.1% |
| Dash Punctuation | 6535 | 0.1% |
| Open Punctuation | 2921 | < 0.1% |
| Currency Symbol | 2359 | < 0.1% |
| Other values (7) | 3840 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 334873 | |
| R | 245314 | |
| K | 170898 | 8.2% |
| M | 157101 | 7.6% |
| P | 145199 | 7.0% |
| A | 131141 | 6.3% |
| B | 119840 | 5.8% |
| V | 114483 | 5.5% |
| N | 103135 | 5.0% |
| G | 85696 | 4.1% |
| Other values (56) | 471573 |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 2210093 | |
| i | 766624 | 8.0% |
| r | 731407 | 7.6% |
| n | 683385 | 7.1% |
| h | 615590 | 6.4% |
| e | 550117 | 5.7% |
| u | 501327 | 5.2% |
| d | 452484 | 4.7% |
| s | 377404 | 3.9% |
| m | 361429 | 3.8% |
| Other values (47) | 2337027 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 92224 | |
| @ | 2075 | 1.9% |
| • | 1681 | 1.6% |
| ' | 1474 | 1.4% |
| § | 1164 | 1.1% |
| ‡ | 1109 | 1.0% |
| † | 1099 | 1.0% |
| * | 1082 | 1.0% |
| & | 1070 | 1.0% |
| … | 1024 | 0.9% |
| Other values (11) | 4027 | 3.7% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 18533 | |
| 1 | 14094 | |
| 0 | 7814 | |
| 3 | 6920 | 8.8% |
| 4 | 6744 | 8.6% |
| 9 | 6016 | 7.7% |
| 8 | 4917 | 6.3% |
| 5 | 4914 | 6.3% |
| 7 | 4551 | 5.8% |
| 6 | 3758 | 4.8% |
Math Symbol
| Value | Count | Frequency (%) |
| ± | 9873 | |
| ¬ | 1058 | 9.4% |
| \| | 111 | 1.0% |
| + | 88 | 0.8% |
| ~ | 48 | 0.4% |
| = | 11 | 0.1% |
| < | 10 | 0.1% |
| > | 6 | 0.1% |
Other Symbol
| Value | Count | Frequency (%) |
| ° | 29772 | |
| ® | 2143 | 6.2% |
| ¦ | 1052 | 3.1% |
| № | 672 | 1.9% |
| © | 566 | 1.6% |
| ™ | 268 | 0.8% |
Open Punctuation
| Value | Count | Frequency (%) |
| ‚ | 1819 | |
| ( | 556 | 19.0% |
| „ | 502 | 17.2% |
| { | 32 | 1.1% |
| [ | 12 | 0.4% |
Initial Punctuation
| Value | Count | Frequency (%) |
| ‹ | 515 | |
| ‘ | 214 | |
| « | 208 | |
| “ | 107 | 10.2% |
Final Punctuation
| Value | Count | Frequency (%) |
| » | 188 | |
| ’ | 137 | |
| › | 101 | |
| ” | 100 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 5588 | |
| — | 808 | 12.4% |
| – | 139 | 2.1% |
Currency Symbol
| Value | Count | Frequency (%) |
| ¤ | 1334 | |
| $ | 537 | |
| € | 488 | 20.7% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 613 | |
| } | 34 | 5.1% |
| ] | 14 | 2.1% |
Space Separator
| Value | Count | Frequency (%) |
|---|---|---|
| (space) | 1107756 | |
| (non-ASCII space) | 372 | < 0.1% |
Modifier Symbol
| Value | Count | Frequency (%) |
| ^ | 37 | |
| ` | 16 |
Control
| Value | Count | Frequency (%) |
| | 625 |
Format
| Value | Count | Frequency (%) |
| | 539 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 392 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 11578019 | |
| Common | 1357891 | 10.4% |
| Cyrillic | 85981 | 0.7% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| (space) | 1107756 | |
| . | 92224 | 6.8% |
| ° | 29772 | 2.2% |
| 2 | 18533 | 1.4% |
| 1 | 14094 | 1.0% |
| ± | 9873 | 0.7% |
| 0 | 7814 | 0.6% |
| 3 | 6920 | 0.5% |
| 4 | 6744 | 0.5% |
| 9 | 6016 | 0.4% |
| Other values (65) | 58145 | 4.3% |
Cyrillic
| Value | Count | Frequency (%) |
| а | 39656 | |
| Ќ | 4727 | 5.5% |
| Г | 3407 | 4.0% |
| Ш | 3227 | 3.8% |
| Ё | 3048 | 3.5% |
| ѕ | 2916 | 3.4% |
| ї | 2653 | 3.1% |
| Щ | 2233 | 2.6% |
| Ѓ | 2200 | 2.6% |
| І | 1707 | 2.0% |
| Other values (60) | 20207 |
Latin
| Value | Count | Frequency (%) |
| a | 2210093 | |
| i | 766624 | 6.6% |
| r | 731407 | 6.3% |
| n | 683385 | 5.9% |
| h | 615590 | 5.3% |
| e | 550117 | 4.8% |
| u | 501327 | 4.3% |
| d | 452484 | 3.9% |
| s | 377404 | 3.3% |
| m | 361429 | 3.1% |
| Other values (42) | 4328159 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 12872594 | |
| Cyrillic | 85981 | 0.7% |
| None | 52426 | 0.4% |
| Punctuation | 9462 | 0.1% |
| Letterlike Symbols | 940 | < 0.1% |
| Currency Symbols | 488 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 2210093 | |
| (space) | 1107756 | 8.6% |
| i | 766624 | 6.0% |
| r | 731407 | 5.7% |
| n | 683385 | 5.3% |
| h | 615590 | 4.8% |
| e | 550117 | 4.3% |
| u | 501327 | 3.9% |
| d | 452484 | 3.5% |
| s | 377404 | 2.9% |
| Other values (83) | 4876407 |
Cyrillic
| Value | Count | Frequency (%) |
| а | 39656 | |
| Ќ | 4727 | 5.5% |
| Г | 3407 | 4.0% |
| Ш | 3227 | 3.8% |
| Ё | 3048 | 3.5% |
| ѕ | 2916 | 3.4% |
| ї | 2653 | 3.1% |
| Щ | 2233 | 2.6% |
| Ѓ | 2200 | 2.6% |
| І | 1707 | 2.0% |
| Other values (60) | 20207 |
None
| Value | Count | Frequency (%) |
| ° | 29772 | |
| ± | 9873 | 18.8% |
| ® | 2143 | 4.1% |
| µ | 2140 | 4.1% |
| ¤ | 1334 | 2.5% |
| § | 1164 | 2.2% |
| ¬ | 1058 | 2.0% |
| ¦ | 1052 | 2.0% |
| ¶ | 772 | 1.5% |
| | 625 | 1.2% |
| Other values (6) | 2493 | 4.8% |
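The Cyrillic and "None" block characters listed above are unusual for a name column. One possible explanation, which the report itself does not confirm, is that UTF-8 text in an Indic script such as Telugu was decoded as Windows-1251. Telugu code points encode to three UTF-8 bytes beginning 0xE0 0xB0 or 0xE0 0xB1, and Windows-1251 maps those bytes to "а", "°" and "±". The counts are consistent with this reading: "а" occurs 39656 times, close to the 39645 combined occurrences of "°" and "±". The sketch below simulates the suspected corruption and reverses it; the sample word is illustrative and is not taken from the dataset.

```python
# Hypothesis only: UTF-8 bytes were decoded with the wrong codec (Windows-1251).
# The sample word is illustrative and is NOT taken from the dataset.
# It spells "Telugu" in Telugu script, written with escapes for portability.
original = "\u0c24\u0c46\u0c32\u0c41\u0c17\u0c41"

# Simulate the suspected corruption.
mojibake = original.encode("utf-8").decode("cp1251")

# Each Telugu code point yields the lead byte 0xE0 (Cyrillic small a, U+0430)
# followed by 0xB0 (degree sign) or 0xB1 (plus-minus sign).
lead = mojibake.count("\u0430")
second = mojibake.count("\u00b0") + mojibake.count("\u00b1")

# Reverse the mistake to recover the original text.
repaired = mojibake.encode("cp1251").decode("utf-8")

print(mojibake)
print(lead, second, repaired == original)
```

If the hypothesis holds for the real data, the same encode/decode round trip could recover affected values. Byte 0x98 is undefined in Windows-1251, so values whose original text contained that byte may already have been damaged beyond repair.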
Punctuation
| Value | Count | Frequency (%) |
| ‚ | 1819 | |
| • | 1681 | |
| ‡ | 1109 | |
| † | 1099 | |
| … | 1024 | |
| — | 808 | |
| ‹ | 515 | 5.4% |
| „ | 502 | 5.3% |
| ‘ | 214 | 2.3% |
| – | 139 | 1.5% |
| Other values (5) | 552 | 5.8% |
Letterlike Symbols
| Value | Count | Frequency (%) |
| № | 672 | |
| ™ | 268 | 28.5% |
Currency Symbols
| Value | Count | Frequency (%) |
| € | 488 |
Gender
Text
MISSING 
| Distinct | 1916 |
|---|---|
| Distinct (%) | 4.2% |
| Missing | 1003236 |
| Missing (%) | 95.7% |
| Memory size | 8.0 MiB |
Length
| Max length | 84 |
|---|---|
| Median length | 4 |
| Mean length | 4.377644 |
| Min length | 1 |
Characters and Unicode
| Total characters | 198478 |
|---|---|
| Distinct characters | 122 |
| Distinct categories | 13 |
| Distinct scripts | 3 |
| Distinct blocks | 5 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 1702 |
|---|---|
| Unique (%) | 3.8% |
Sample
| 1st row | MALE |
|---|---|
| 2nd row | MALE |
| 3rd row | FEMALE |
| 4th row | FEMALE |
| 5th row | MALE |
| Value | Count | Frequency (%) |
| male | 37225 | |
| female | 5302 | 11.5% |
| v | 159 | 0.3% |
| i | 75 | 0.2% |
| 2 | 70 | 0.2% |
| m | 52 | 0.1% |
| k | 45 | 0.1% |
| p | 39 | 0.1% |
| s | 39 | 0.1% |
| reddy | 29 | 0.1% |
| Other values (1878) | 2912 | 6.3% |
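The word counts above show "male" and "female" covering most non-missing entries, while the remaining tokens look like noise. A minimal cleaning sketch follows; the function name `normalize_gender` and the policy of mapping every other value to `None` are assumptions made for illustration, not something the report prescribes.

```python
from typing import Optional

# The only two values treated as valid in this sketch (an assumption).
VALID = {"MALE", "FEMALE"}


def normalize_gender(raw: Optional[str]) -> Optional[str]:
    """Return 'MALE' or 'FEMALE', or None for missing or unrecognised input."""
    if raw is None:
        return None
    cleaned = raw.strip().upper()
    return cleaned if cleaned in VALID else None


for value in ["MALE", " female ", "reddy", "2", None]:
    print(repr(value), "->", normalize_gender(value))
```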
Most occurring characters
| Value | Count | Frequency (%) |
| E | 47869 | |
| M | 42722 | |
| A | 42673 | |
| L | 42571 | |
| F | 5319 | 2.7% |
| (space) | 2407 | 1.2% |
| a | 2263 | 1.1% |
| r | 955 | 0.5% |
| i | 762 | 0.4% |
| n | 740 | 0.4% |
| Other values (112) | 10197 | 5.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 183084 | |
| Lowercase Letter | 12132 | 6.1% |
| Space Separator | 2407 | 1.2% |
| Other Punctuation | 378 | 0.2% |
| Decimal Number | 305 | 0.2% |
| Other Symbol | 104 | 0.1% |
| Math Symbol | 31 | < 0.1% |
| Dash Punctuation | 17 | < 0.1% |
| Open Punctuation | 11 | < 0.1% |
| Close Punctuation | 3 | < 0.1% |
| Other values (3) | 6 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 47869 | |
| M | 42722 | |
| A | 42673 | |
| L | 42571 | |
| F | 5319 | 2.9% |
| V | 238 | 0.1% |
| S | 224 | 0.1% |
| R | 194 | 0.1% |
| K | 158 | 0.1% |
| P | 152 | 0.1% |
| Other values (30) | 964 | 0.5% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 2263 | |
| r | 955 | 7.9% |
| i | 762 | 6.3% |
| n | 740 | 6.1% |
| d | 659 | 5.4% |
| e | 638 | 5.3% |
| u | 576 | 4.7% |
| l | 570 | 4.7% |
| t | 567 | 4.7% |
| s | 555 | 4.6% |
| Other values (28) | 3847 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 312 | |
| @ | 8 | 2.1% |
| & | 8 | 2.1% |
| ! | 7 | 1.9% |
| ' | 6 | 1.6% |
| • | 5 | 1.3% |
| ? | 5 | 1.3% |
| ‡ | 4 | 1.1% |
| … | 4 | 1.1% |
| † | 4 | 1.1% |
| Other values (8) | 15 | 4.0% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 98 | |
| 1 | 51 | |
| 3 | 27 | 8.9% |
| 9 | 27 | 8.9% |
| 0 | 24 | 7.9% |
| 4 | 24 | 7.9% |
| 6 | 17 | 5.6% |
| 5 | 14 | 4.6% |
| 8 | 13 | 4.3% |
| 7 | 10 | 3.3% |
Other Symbol
| Value | Count | Frequency (%) |
| ° | 95 | |
| ® | 8 | 7.7% |
| ¦ | 1 | 1.0% |
Math Symbol
| Value | Count | Frequency (%) |
| ± | 30 | |
| ~ | 1 | 3.2% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 14 | |
| — | 3 | 17.6% |
Open Punctuation
| Value | Count | Frequency (%) |
| ‚ | 8 | |
| „ | 3 | 27.3% |
Initial Punctuation
| Value | Count | Frequency (%) |
| ‹ | 2 | |
| ‘ | 1 |
Currency Symbol
| Value | Count | Frequency (%) |
| ¤ | 1 | |
| € | 1 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2407 | 100.0% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 3 |
Control
| Value | Count | Frequency (%) |
| | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 195000 | |
| Common | 3270 | 1.6% |
| Cyrillic | 208 | 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| E | 47869 | |
| M | 42722 | |
| A | 42673 | |
| L | 42571 | |
| F | 5319 | 2.7% |
| a | 2263 | 1.2% |
| r | 955 | 0.5% |
| i | 762 | 0.4% |
| n | 740 | 0.4% |
| d | 659 | 0.3% |
| Other values (42) | 8467 | 4.3% |
Common
| Value | Count | Frequency (%) |
| (space) | 2407 | |
| . | 312 | 9.5% |
| 2 | 98 | 3.0% |
| ° | 95 | 2.9% |
| 1 | 51 | 1.6% |
| ± | 30 | 0.9% |
| 3 | 27 | 0.8% |
| 9 | 27 | 0.8% |
| 0 | 24 | 0.7% |
| 4 | 24 | 0.7% |
| Other values (35) | 175 | 5.4% |
Cyrillic
| Value | Count | Frequency (%) |
| а | 118 | |
| Ќ | 15 | 7.2% |
| ї | 10 | 4.8% |
| В | 8 | 3.8% |
| џ | 8 | 3.8% |
| Ѓ | 6 | 2.9% |
| ѕ | 5 | 2.4% |
| Ё | 5 | 2.4% |
| Ў | 4 | 1.9% |
| в | 4 | 1.9% |
| Other values (15) | 25 | 12.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 198084 | |
| Cyrillic | 208 | 0.1% |
| None | 151 | 0.1% |
| Punctuation | 34 | < 0.1% |
| Currency Symbols | 1 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| E | 47869 | |
| M | 42722 | |
| A | 42673 | |
| L | 42571 | |
| F | 5319 | 2.7% |
| (space) | 2407 | 1.2% |
| a | 2263 | 1.1% |
| r | 955 | 0.5% |
| i | 762 | 0.4% |
| n | 740 | 0.4% |
| Other values (67) | 9803 | 4.9% |
Cyrillic
| Value | Count | Frequency (%) |
| а | 118 | |
| Ќ | 15 | 7.2% |
| ї | 10 | 4.8% |
| В | 8 | 3.8% |
| џ | 8 | 3.8% |
| Ѓ | 6 | 2.9% |
| ѕ | 5 | 2.4% |
| Ё | 5 | 2.4% |
| Ў | 4 | 1.9% |
| в | 4 | 1.9% |
| Other values (15) | 25 | 12.0% |
None
| Value | Count | Frequency (%) |
| ° | 95 | |
| ± | 30 | 19.9% |
| µ | 8 | 5.3% |
| ® | 8 | 5.3% |
| ¶ | 3 | 2.0% |
| · | 3 | 2.0% |
| | 1 | 0.7% |
| § | 1 | 0.7% |
| ¤ | 1 | 0.7% |
| ¦ | 1 | 0.7% |
Punctuation
| Value | Count | Frequency (%) |
| ‚ | 8 | |
| • | 5 | |
| ‡ | 4 | |
| … | 4 | |
| † | 4 | |
| „ | 3 | 8.8% |
| — | 3 | 8.8% |
| ‹ | 2 | 5.9% |
| ‘ | 1 | 2.9% |
Currency Symbols
| Value | Count | Frequency (%) |
| € | 1 |
Address
Text
| Distinct | 7642 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 2717 |
| Missing (%) | 0.3% |
| Memory size | 8.0 MiB |
Length
| Max length | 141 |
|---|---|
| Median length | 124 |
| Mean length | 15.278211 |
| Min length | 1 |
Characters and Unicode
| Total characters | 15978839 |
|---|---|
| Distinct characters | 159 |
| Distinct categories | 16 |
| Distinct scripts | 3 |
| Distinct blocks | 6 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 6840 |
|---|---|
| Unique (%) | 0.7% |
Sample
| 1st row | Andhra Pradesh |
|---|---|
| 2nd row | Andhra Pradesh |
| 3rd row | Andhra Pradesh |
| 4th row | Andhra Pradesh |
| 5th row | Andhra Pradesh in |
| Value | Count | Frequency (%) |
| pradesh | 939164 | |
| andhra | 939150 | |
| in | 531915 | |
| punjab | 46649 | 1.9% |
| kerala | 23990 | 1.0% |
| hyderabad | 5782 | 0.2% |
| visakhapatnam | 1277 | 0.1% |
| vijayawada | 1041 | < 0.1% |
| nellore | 846 | < 0.1% |
| guntur | 771 | < 0.1% |
| Other values (7089) | 27371 | 1.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 2032476 | |
| (space) | 1985450 | |
| r | 1923828 | |
| d | 1899605 | |
| h | 1885559 | |
| n | 1531944 | |
| P | 987141 | |
| e | 978569 | |
| s | 943360 | |
| A | 942598 | |
| Other values (149) | 868309 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 11997033 | |
| Uppercase Letter | 1987538 | 12.4% |
| Space Separator | 1985455 | 12.4% |
| Decimal Number | 5472 | < 0.1% |
| Other Punctuation | 1312 | < 0.1% |
| Other Symbol | 764 | < 0.1% |
| Dash Punctuation | 576 | < 0.1% |
| Math Symbol | 237 | < 0.1% |
| Open Punctuation | 214 | < 0.1% |
| Close Punctuation | 155 | < 0.1% |
| Other values (6) | 83 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| P | 987141 | |
| A | 942598 | |
| K | 26961 | 1.4% |
| H | 6519 | 0.3% |
| V | 3772 | 0.2% |
| N | 2959 | 0.1% |
| R | 1912 | 0.1% |
| S | 1803 | 0.1% |
| T | 1598 | 0.1% |
| G | 1597 | 0.1% |
| Other values (41) | 10678 | 0.5% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 2032476 | |
| r | 1923828 | |
| d | 1899605 | |
| h | 1885559 | |
| n | 1531944 | |
| e | 978569 | |
| s | 943360 | |
| i | 545628 | 4.5% |
| u | 56739 | 0.5% |
| b | 54724 | 0.5% |
| Other values (34) | 144601 | 1.2% |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 860 | |
| / | 227 | 17.3% |
| : | 36 | 2.7% |
| ; | 23 | 1.8% |
| • | 21 | 1.6% |
| # | 20 | 1.5% |
| † | 18 | 1.4% |
| ' | 17 | 1.3% |
| & | 16 | 1.2% |
| … | 14 | 1.1% |
| Other values (9) | 60 | 4.6% |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 1025 | |
| 1 | 975 | |
| 5 | 820 | |
| 2 | 674 | |
| 3 | 567 | |
| 4 | 427 | |
| 6 | 289 | 5.3% |
| 8 | 247 | 4.5% |
| 7 | 246 | 4.5% |
| 9 | 202 | 3.7% |
Other Symbol
| Value | Count | Frequency (%) |
| ° | 714 | |
| ¦ | 21 | 2.7% |
| ® | 12 | 1.6% |
| № | 11 | 1.4% |
| © | 4 | 0.5% |
| ™ | 2 | 0.3% |
Math Symbol
| Value | Count | Frequency (%) |
| ± | 214 | |
| ¬ | 18 | 7.6% |
| + | 2 | 0.8% |
| ~ | 1 | 0.4% |
| < | 1 | 0.4% |
| = | 1 | 0.4% |
Initial Punctuation
| Value | Count | Frequency (%) |
| ‹ | 12 | |
| « | 5 | |
| ‘ | 3 | 13.6% |
| “ | 2 | 9.1% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 535 | |
| — | 36 | 6.2% |
| – | 5 | 0.9% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 157 | |
| ‚ | 54 | 25.2% |
| „ | 3 | 1.4% |
Currency Symbol
| Value | Count | Frequency (%) |
| ¤ | 16 | |
| € | 4 | 19.0% |
| $ | 1 | 4.8% |
Final Punctuation
| Value | Count | Frequency (%) |
| ’ | 5 | |
| » | 2 | 22.2% |
| › | 2 | 22.2% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1985450 | |
| (non-ASCII space) | 5 | < 0.1% |
Control
| Value | Count | Frequency (%) |
| (control character) | 5 | |
| (control character) | 2 | 28.6% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 155 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 22 |
Format
| Value | Count | Frequency (%) |
| | 2 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 13983015 | |
| Common | 1994299 | 12.5% |
| Cyrillic | 1525 | < 0.1% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| (space) | 1985450 | |
| 0 | 1025 | 0.1% |
| 1 | 975 | < 0.1% |
| . | 860 | < 0.1% |
| 5 | 820 | < 0.1% |
| ° | 714 | < 0.1% |
| 2 | 674 | < 0.1% |
| 3 | 567 | < 0.1% |
| - | 535 | < 0.1% |
| 4 | 427 | < 0.1% |
| Other values (55) | 2252 | 0.1% |
Latin
| Value | Count | Frequency (%) |
| a | 2032476 | |
| r | 1923828 | |
| d | 1899605 | |
| h | 1885559 | |
| n | 1531944 | |
| P | 987141 | |
| e | 978569 | |
| s | 943360 | |
| A | 942598 | |
| i | 545628 | 3.9% |
| Other values (42) | 312307 | 2.2% |
Cyrillic
| Value | Count | Frequency (%) |
| а | 852 | |
| Ќ | 94 | 6.2% |
| ѕ | 71 | 4.7% |
| ї | 53 | 3.5% |
| Ѓ | 45 | 3.0% |
| Ё | 41 | 2.7% |
| І | 38 | 2.5% |
| Є | 37 | 2.4% |
| Г | 32 | 2.1% |
| џ | 29 | 1.9% |
| Other values (32) | 233 | 15.3% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 15976030 | |
| Cyrillic | 1525 | < 0.1% |
| None | 1078 | < 0.1% |
| Punctuation | 189 | < 0.1% |
| Letterlike Symbols | 13 | < 0.1% |
| Currency Symbols | 4 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 2032476 | |
| (space) | 1985450 | |
| r | 1923828 | |
| d | 1899605 | |
| h | 1885559 | |
| n | 1531944 | |
| P | 987141 | |
| e | 978569 | |
| s | 943360 | |
| A | 942598 | |
| Other values (74) | 865500 |
Cyrillic
| Value | Count | Frequency (%) |
| а | 852 | |
| Ќ | 94 | 6.2% |
| ѕ | 71 | 4.7% |
| ї | 53 | 3.5% |
| Ѓ | 45 | 3.0% |
| Ё | 41 | 2.7% |
| І | 38 | 2.5% |
| Є | 37 | 2.4% |
| Г | 32 | 2.1% |
| џ | 29 | 1.9% |
| Other values (32) | 233 | 15.3% |
None
| Value | Count | Frequency (%) |
| ° | 714 | |
| ± | 214 | 19.9% |
| µ | 31 | 2.9% |
| ¦ | 21 | 1.9% |
| ¬ | 18 | 1.7% |
| ¤ | 16 | 1.5% |
| § | 14 | 1.3% |
| ® | 12 | 1.1% |
| ¶ | 11 | 1.0% |
| « | 5 | 0.5% |
| Other values (6) | 22 | 2.0% |
Punctuation
| Value | Count | Frequency (%) |
| ‚ | 54 | |
| — | 36 | |
| • | 21 | 11.1% |
| † | 18 | 9.5% |
| … | 14 | 7.4% |
| ‡ | 13 | 6.9% |
| ‹ | 12 | 6.3% |
| – | 5 | 2.6% |
| ’ | 5 | 2.6% |
| ‘ | 3 | 1.6% |
| Other values (4) | 8 | 4.2% |
Letterlike Symbols
| Value | Count | Frequency (%) |
| № | 11 | |
| ™ | 2 | 15.4% |
Currency Symbols
| Value | Count | Frequency (%) |
| € | 4 |
JobTitle
Text
MISSING 
| Distinct | 6533 |
|---|---|
| Distinct (%) | 25.4% |
| Missing | 1022814 |
| Missing (%) | 97.5% |
| Memory size | 8.0 MiB |
Length
| Max length | 154 |
|---|---|
| Median length | 84 |
| Mean length | 9.7735336 |
| Min length | 1 |
Characters and Unicode
| Total characters | 251776 |
|---|---|
| Distinct characters | 167 |
| Distinct categories | 17 |
| Distinct scripts | 3 |
| Distinct blocks | 6 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 5199 |
|---|---|
| Unique (%) | 20.2% |
Sample
| 1st row | in |
|---|---|
| 2nd row | 531149 |
| 3rd row | in |
| 4th row | Kapurthala |
| 5th row | Punjab |
| Value | Count | Frequency (%) |
| in | 7769 | |
| pradesh | 4762 | 12.3% |
| andhra | 4756 | 12.3% |
| india | 2638 | 6.8% |
| hyderabad | 1608 | 4.2% |
| vijayawada | 583 | 1.5% |
| anantapur | 293 | 0.8% |
| karimnagar | 243 | 0.6% |
| bangalore | 219 | 0.6% |
| warangal | 206 | 0.5% |
| Other values (5768) | 15513 |
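The sample rows and the most frequent words for JobTitle ("in", "531149", "Kapurthala", "Punjab", "andhra", "pradesh") look like address fragments rather than job titles, which may indicate that fields were shifted during export. The sketch below flags values that resemble location data. The six-digit PIN code pattern and the small keyword set are assumptions chosen for illustration, and `looks_like_location` is a hypothetical helper.

```python
import re

# Indian PIN codes are six digits and do not start with 0.
PIN_CODE = re.compile(r"[1-9]\d{5}")

# Illustrative keywords taken from the word table and sample rows above.
LOCATION_WORDS = {"in", "india", "andhra", "pradesh", "punjab", "hyderabad"}


def looks_like_location(value: str) -> bool:
    """Heuristic: True if value is a PIN code or only location keywords."""
    text = value.strip()
    if PIN_CODE.fullmatch(text):
        return True
    words = text.lower().split()
    return bool(words) and all(word in LOCATION_WORDS for word in words)


samples = ["in", "531149", "in", "Kapurthala", "Punjab"]
flags = [looks_like_location(s) for s in samples]
print(list(zip(samples, flags)))
```

A keyword list this small misses most place names ("Kapurthala" is not flagged), so the heuristic only gives a lower bound on how many rows are affected.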
Most occurring characters
| Value | Count | Frequency (%) |
| a | 31182 | 12.4% |
| (space) | 30579 | 12.1% |
| n | 20860 | 8.3% |
| d | 18615 | 7.4% |
| r | 17975 | 7.1% |
| i | 16130 | 6.4% |
| h | 11900 | 4.7% |
| e | 11655 | 4.6% |
| s | 7405 | 2.9% |
| A | 6395 | 2.5% |
| Other values (157) | 79080 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 170282 | |
| Space Separator | 30582 | 12.1% |
| Uppercase Letter | 29186 | 11.6% |
| Decimal Number | 19643 | 7.8% |
| Other Punctuation | 872 | 0.3% |
| Other Symbol | 527 | 0.2% |
| Math Symbol | 187 | 0.1% |
| Open Punctuation | 153 | 0.1% |
| Dash Punctuation | 134 | 0.1% |
| Close Punctuation | 98 | < 0.1% |
| Other values (7) | 112 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 6395 | |
| P | 5525 | |
| I | 3140 | |
| H | 1945 | 6.7% |
| S | 1028 | 3.5% |
| N | 976 | 3.3% |
| M | 957 | 3.3% |
| V | 928 | 3.2% |
| T | 908 | 3.1% |
| R | 879 | 3.0% |
| Other values (44) | 6505 |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 31182 | |
| n | 20860 | |
| d | 18615 | |
| r | 17975 | |
| i | 16130 | |
| h | 11900 | 7.0% |
| e | 11655 | 6.8% |
| s | 7405 | 4.3% |
| t | 4312 | 2.5% |
| l | 3607 | 2.1% |
| Other values (37) | 26641 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 586 | |
| / | 47 | 5.4% |
| & | 44 | 5.0% |
| • | 28 | 3.2% |
| @ | 26 | 3.0% |
| ' | 22 | 2.5% |
| † | 21 | 2.4% |
| ‡ | 17 | 1.9% |
| ! | 17 | 1.9% |
| : | 13 | 1.5% |
| Other values (9) | 51 | 5.8% |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 4843 | |
| 5 | 4164 | |
| 1 | 2671 | |
| 2 | 2245 | |
| 3 | 1864 | 9.5% |
| 4 | 1209 | 6.2% |
| 6 | 816 | 4.2% |
| 7 | 746 | 3.8% |
| 8 | 623 | 3.2% |
| 9 | 462 | 2.4% |
Other Symbol
| Value | Count | Frequency (%) |
| ° | 470 | |
| № | 26 | 4.9% |
| ® | 18 | 3.4% |
| ¦ | 10 | 1.9% |
| ™ | 3 | 0.6% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 92 | |
| ‚ | 52 | |
| „ | 5 | 3.3% |
| [ | 2 | 1.3% |
| { | 2 | 1.3% |
Math Symbol
| Value | Count | Frequency (%) |
| ± | 171 | |
| ¬ | 14 | 7.5% |
| \| | 1 | 0.5% |
| ~ | 1 | 0.5% |
Initial Punctuation
| Value | Count | Frequency (%) |
| ‹ | 9 | |
| « | 5 | |
| “ | 4 | |
| ‘ | 2 | 10.0% |
Final Punctuation
| Value | Count | Frequency (%) |
| ’ | 5 | |
| » | 2 | 20.0% |
| ” | 2 | 20.0% |
| › | 1 | 10.0% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 115 | |
| — | 15 | 11.2% |
| – | 4 | 3.0% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 94 | |
| ] | 2 | 2.0% |
| } | 2 | 2.0% |
Currency Symbol
| Value | Count | Frequency (%) |
| ¤ | 23 | |
| $ | 11 | |
| € | 5 | 12.8% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 30579 | |
| (non-ASCII space) | 3 | < 0.1% |
Control
| Value | Count | Frequency (%) |
| | 28 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 10 |
Format
| Value | Count | Frequency (%) |
| | 4 |
Modifier Symbol
| Value | Count | Frequency (%) |
| ^ | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 198143 | |
| Common | 52325 | 20.8% |
| Cyrillic | 1308 | 0.5% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| (space) | 30579 | |
| 0 | 4843 | 9.3% |
| 5 | 4164 | 8.0% |
| 1 | 2671 | 5.1% |
| 2 | 2245 | 4.3% |
| 3 | 1864 | 3.6% |
| 4 | 1209 | 2.3% |
| 6 | 816 | 1.6% |
| 7 | 746 | 1.4% |
| 8 | 623 | 1.2% |
| Other values (57) | 2565 | 4.9% |
Latin
| Value | Count | Frequency (%) |
| a | 31182 | |
| n | 20860 | |
| d | 18615 | 9.4% |
| r | 17975 | 9.1% |
| i | 16130 | 8.1% |
| h | 11900 | 6.0% |
| e | 11655 | 5.9% |
| s | 7405 | 3.7% |
| A | 6395 | 3.2% |
| P | 5525 | 2.8% |
| Other values (42) | 50501 |
Cyrillic
| Value | Count | Frequency (%) |
| а | 589 | |
| Ќ | 81 | 6.2% |
| в | 52 | 4.0% |
| ѕ | 49 | 3.7% |
| Г | 49 | 3.7% |
| џ | 44 | 3.4% |
| ё | 34 | 2.6% |
| Ѓ | 31 | 2.4% |
| ї | 31 | 2.4% |
| р | 29 | 2.2% |
| Other values (38) | 319 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 249478 | |
| Cyrillic | 1308 | 0.5% |
| None | 781 | 0.3% |
| Punctuation | 175 | 0.1% |
| Letterlike Symbols | 29 | < 0.1% |
| Currency Symbols | 5 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 31182 | |
| (space) | 30579 | 12.3% |
| n | 20860 | 8.4% |
| d | 18615 | 7.5% |
| r | 17975 | 7.2% |
| i | 16130 | 6.5% |
| h | 11900 | 4.8% |
| e | 11655 | 4.7% |
| s | 7405 | 3.0% |
| A | 6395 | 2.6% |
| Other values (76) | 76782 |
Cyrillic
| Value | Count | Frequency (%) |
| а | 589 | |
| Ќ | 81 | 6.2% |
| в | 52 | 4.0% |
| ѕ | 49 | 3.7% |
| Г | 49 | 3.7% |
| џ | 44 | 3.4% |
| ё | 34 | 2.6% |
| Ѓ | 31 | 2.4% |
| ї | 31 | 2.4% |
| р | 29 | 2.2% |
| Other values (38) | 319 |
None
| Value | Count | Frequency (%) |
| ° | 470 | |
| ± | 171 | 21.9% |
| | 28 | 3.6% |
| ¤ | 23 | 2.9% |
| ® | 18 | 2.3% |
| µ | 17 | 2.2% |
| ¬ | 14 | 1.8% |
| ¶ | 11 | 1.4% |
| ¦ | 10 | 1.3% |
| « | 5 | 0.6% |
| Other values (5) | 14 | 1.8% |
Punctuation
| Value | Count | Frequency (%) |
| ‚ | 52 | |
| • | 28 | |
| † | 21 | |
| ‡ | 17 | 9.7% |
| — | 15 | 8.6% |
| ‹ | 9 | 5.1% |
| … | 7 | 4.0% |
| ’ | 5 | 2.9% |
| „ | 5 | 2.9% |
| – | 4 | 2.3% |
| Other values (5) | 12 | 6.9% |
Letterlike Symbols
| Value | Count | Frequency (%) |
| № | 26 | |
| ™ | 3 | 10.3% |
Currency Symbols
| Value | Count | Frequency (%) |
| € | 5 |
CompanyName
Text
MISSING 
| Distinct | 7016 |
|---|---|
| Distinct (%) | 38.7% |
| Missing | 1030447 |
| Missing (%) | 98.3% |
| Memory size | 8.0 MiB |
Length
| Max length | 90 |
|---|---|
| Median length | 68 |
| Mean length | 9.80406 |
| Min length | 1 |
Characters and Unicode
| Total characters | 177728 |
|---|---|
| Distinct characters | 158 |
| Distinct categories | 17 |
| Distinct scripts | 3 |
| Distinct blocks | 6 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 6020 |
|---|---|
| Unique (%) | 33.2% |
Sample
| 1st row | Police |
|---|---|
| 2nd row | Arakuvalley |
| 3rd row | Punjab |
| 4th row | India Jalalabad |
| 5th row | Infoys |
| Value | Count | Frequency (%) |
| india | 3369 | 12.9% |
| in | 2095 | 8.0% |
| andhra | 1759 | 6.7% |
| pradesh | 1743 | 6.7% |
| hyderabad | 469 | 1.8% |
| police | 207 | 0.8% |
| ltd | 151 | 0.6% |
| nellore | 150 | 0.6% |
| 146 | 0.6% | |
| bank | 127 | 0.5% |
| Other values (6369) | 15876 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 20949 | 11.8% |
| (space) | 18729 | 10.5% |
| n | 12732 | 7.2% |
| i | 11825 | 6.7% |
| d | 10965 | 6.2% |
| r | 10247 | 5.8% |
| e | 8304 | 4.7% |
| h | 6020 | 3.4% |
| s | 5232 | 2.9% |
| t | 4786 | 2.7% |
| Other values (148) | 67939 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 120423 | |
| Uppercase Letter | 33149 | 18.7% |
| Space Separator | 18730 | 10.5% |
| Decimal Number | 2924 | 1.6% |
| Other Punctuation | 1161 | 0.7% |
| Other Symbol | 782 | 0.4% |
| Math Symbol | 242 | 0.1% |
| Open Punctuation | 94 | 0.1% |
| Dash Punctuation | 72 | < 0.1% |
| Currency Symbol | 48 | < 0.1% |
| Other values (7) | 103 | 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| I | 4561 | |
| A | 4252 | |
| P | 3360 | 10.1% |
| S | 2342 | 7.1% |
| T | 1607 | 4.8% |
| C | 1520 | 4.6% |
| R | 1463 | 4.4% |
| N | 1405 | 4.2% |
| H | 1266 | 3.8% |
| M | 1229 | 3.7% |
| Other values (42) | 10144 |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 20949 | |
| n | 12732 | |
| i | 11825 | |
| d | 10965 | |
| r | 10247 | |
| e | 8304 | 6.9% |
| h | 6020 | 5.0% |
| s | 5232 | 4.3% |
| t | 4786 | 4.0% |
| o | 4286 | 3.6% |
| Other values (35) | 25077 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 747 | |
| & | 125 | 10.8% |
| ' | 45 | 3.9% |
| ‡ | 42 | 3.6% |
| @ | 31 | 2.7% |
| • | 29 | 2.5% |
| † | 25 | 2.2% |
| * | 24 | 2.1% |
| / | 21 | 1.8% |
| ¶ | 15 | 1.3% |
| Other values (9) | 57 | 4.9% |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 698 | |
| 5 | 581 | |
| 1 | 424 | |
| 2 | 328 | |
| 3 | 259 | 8.9% |
| 4 | 201 | 6.9% |
| 8 | 114 | 3.9% |
| 6 | 114 | 3.9% |
| 7 | 106 | 3.6% |
| 9 | 99 | 3.4% |
Other Symbol
| Value | Count | Frequency (%) |
| ° | 707 | |
| ® | 32 | 4.1% |
| ¦ | 20 | 2.6% |
| № | 12 | 1.5% |
| © | 6 | 0.8% |
| ™ | 5 | 0.6% |
Initial Punctuation
| Value | Count | Frequency (%) |
| ‹ | 15 | |
| ‘ | 7 | |
| « | 4 | 14.8% |
| “ | 1 | 3.7% |
Math Symbol
| Value | Count | Frequency (%) |
| ± | 228 | |
| ¬ | 12 | 5.0% |
| + | 2 | 0.8% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 53 | |
| — | 16 | 22.2% |
| – | 3 | 4.2% |
Open Punctuation
| Value | Count | Frequency (%) |
| ‚ | 47 | |
| ( | 45 | |
| „ | 2 | 2.1% |
Currency Symbol
| Value | Count | Frequency (%) |
| ¤ | 34 | |
| € | 13 | 27.1% |
| $ | 1 | 2.1% |
Final Punctuation
| Value | Count | Frequency (%) |
| ” | 5 | |
| ’ | 4 | |
| » | 2 | 18.2% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 18729 | |
| (other space separator) | 1 | < 0.1% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 46 |
Control
| Value | Count | Frequency (%) |
| (control character) | 13 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 3 |
Format
| Value | Count | Frequency (%) |
| (format character) | 2 |
Modifier Symbol
| Value | Count | Frequency (%) |
| ^ | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 151688 | |
| Common | 24190 | 13.6% |
| Cyrillic | 1850 | 1.0% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| (space) | 18729 | |
| . | 747 | 3.1% |
| ° | 707 | 2.9% |
| 0 | 698 | 2.9% |
| 5 | 581 | 2.4% |
| 1 | 424 | 1.8% |
| 2 | 328 | 1.4% |
| 3 | 259 | 1.1% |
| ± | 228 | 0.9% |
| 4 | 201 | 0.8% |
| Other values (52) | 1288 | 5.3% |
Latin
| Value | Count | Frequency (%) |
| a | 20949 | 13.8% |
| n | 12732 | 8.4% |
| i | 11825 | 7.8% |
| d | 10965 | 7.2% |
| r | 10247 | 6.8% |
| e | 8304 | 5.5% |
| h | 6020 | 4.0% |
| s | 5232 | 3.4% |
| t | 4786 | 3.2% |
| I | 4561 | 3.0% |
| Other values (42) | 56067 |
Cyrillic
| Value | Count | Frequency (%) |
| а | 893 | |
| Ќ | 114 | 6.2% |
| џ | 85 | 4.6% |
| ї | 68 | 3.7% |
| І | 57 | 3.1% |
| ѕ | 55 | 3.0% |
| р | 55 | 3.0% |
| Ѓ | 42 | 2.3% |
| ё | 41 | 2.2% |
| Є | 41 | 2.2% |
| Other values (34) | 399 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 174513 | |
| Cyrillic | 1850 | 1.0% |
| None | 1129 | 0.6% |
| Punctuation | 206 | 0.1% |
| Letterlike Symbols | 17 | < 0.1% |
| Currency Symbols | 13 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 20949 | 12.0% |
| (space) | 18729 | 10.7% |
| n | 12732 | 7.3% |
| i | 11825 | 6.8% |
| d | 10965 | 6.3% |
| r | 10247 | 5.9% |
| e | 8304 | 4.8% |
| h | 6020 | 3.4% |
| s | 5232 | 3.0% |
| t | 4786 | 2.7% |
| Other values (71) | 64724 |
Cyrillic
| Value | Count | Frequency (%) |
| а | 893 | |
| Ќ | 114 | 6.2% |
| џ | 85 | 4.6% |
| ї | 68 | 3.7% |
| І | 57 | 3.1% |
| ѕ | 55 | 3.0% |
| р | 55 | 3.0% |
| Ѓ | 42 | 2.3% |
| ё | 41 | 2.2% |
| Є | 41 | 2.2% |
| Other values (34) | 399 |
None
| Value | Count | Frequency (%) |
| ° | 707 | |
| ± | 228 | 20.2% |
| µ | 34 | 3.0% |
| ¤ | 34 | 3.0% |
| ® | 32 | 2.8% |
| ¦ | 20 | 1.8% |
| ¶ | 15 | 1.3% |
| § | 14 | 1.2% |
| | 13 | 1.2% |
| ¬ | 12 | 1.1% |
| Other values (6) | 20 | 1.8% |
Punctuation
| Value | Count | Frequency (%) |
| ‚ | 47 | |
| ‡ | 42 | |
| • | 29 | |
| † | 25 | |
| — | 16 | 7.8% |
| ‹ | 15 | 7.3% |
| ‘ | 7 | 3.4% |
| ‰ | 5 | 2.4% |
| … | 5 | 2.4% |
| ” | 5 | 2.4% |
| Other values (4) | 10 | 4.9% |
Currency Symbols
| Value | Count | Frequency (%) |
| € | 13 |
Letterlike Symbols
| Value | Count | Frequency (%) |
| № | 12 | |
| ™ | 5 |
Email
Text
MISSING 
| Distinct | 324762 |
|---|---|
| Distinct (%) | 96.7% |
| Missing | 712816 |
| Missing (%) | 68.0% |
| Memory size | 8.0 MiB |
Length
| Max length | 82 |
|---|---|
| Median length | 52 |
| Mean length | 23.287093 |
| Min length | 1 |
Characters and Unicode
| Total characters | 7818851 |
|---|---|
| Distinct characters | 137 |
| Distinct categories | 14 |
| Distinct scripts | 3 |
| Distinct blocks | 6 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 322679 |
|---|---|
| Unique (%) | 96.1% |
Sample
| 1st row | singampalli1483@gmail.com |
|---|---|
| 2nd row | alluri081996@gmail.com |
| 3rd row | madhurihari77@gmail.com |
| 4th row | karthik.chandaka98@gmail.com |
| 5th row | naveenrokzs@gmail.com |
| Value | Count | Frequency (%) |
| in | 7584 | 2.2% |
| andhra | 743 | 0.2% |
| pradesh | 742 | 0.2% |
| hyderabad | 94 | < 0.1% |
| india | 54 | < 0.1% |
| ltd | 35 | < 0.1% |
| abc@gmail.com | 34 | < 0.1% |
| student | 33 | < 0.1% |
| vijayawada | 32 | < 0.1% |
| of | 30 | < 0.1% |
| Other values (324887) | 328247 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 1086025 | |
| m | 799518 | 10.2% |
| i | 596157 | 7.6% |
| l | 454943 | 5.8% |
| o | 435204 | 5.6% |
| . | 394351 | 5.0% |
| g | 383837 | 4.9% |
| c | 369638 | 4.7% |
| @ | 325375 | 4.2% |
| r | 318931 | 4.1% |
| Other values (127) | 2654872 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 6473038 | |
| Other Punctuation | 719807 | 9.2% |
| Decimal Number | 592890 | 7.6% |
| Uppercase Letter | 18927 | 0.2% |
| Space Separator | 10855 | 0.1% |
| Connector Punctuation | 2982 | < 0.1% |
| Other Symbol | 133 | < 0.1% |
| Dash Punctuation | 103 | < 0.1% |
| Math Symbol | 39 | < 0.1% |
| Open Punctuation | 37 | < 0.1% |
| Other values (4) | 40 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| A | 2564 | |
| S | 1858 | 9.8% |
| P | 1587 | 8.4% |
| M | 1325 | 7.0% |
| R | 1319 | 7.0% |
| G | 1170 | 6.2% |
| N | 838 | 4.4% |
| K | 820 | 4.3% |
| I | 802 | 4.2% |
| C | 785 | 4.1% |
| Other values (37) | 5859 |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 1086025 | |
| m | 799518 | |
| i | 596157 | 9.2% |
| l | 454943 | 7.0% |
| o | 435204 | 6.7% |
| g | 383837 | 5.9% |
| c | 369638 | 5.7% |
| r | 318931 | 4.9% |
| n | 266539 | 4.1% |
| s | 242809 | 3.8% |
| Other values (28) | 1519437 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 394351 | |
| @ | 325375 | |
| & | 22 | < 0.1% |
| : | 10 | < 0.1% |
| ' | 8 | < 0.1% |
| … | 7 | < 0.1% |
| ! | 6 | < 0.1% |
| • | 5 | < 0.1% |
| § | 5 | < 0.1% |
| † | 5 | < 0.1% |
| Other values (6) | 13 | < 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 91656 | |
| 9 | 77564 | |
| 2 | 66211 | |
| 0 | 65808 | |
| 3 | 52077 | |
| 4 | 50932 | |
| 7 | 50467 | |
| 6 | 46665 | |
| 8 | 45790 | |
| 5 | 45720 |
Other Symbol
| Value | Count | Frequency (%) |
| ° | 115 | |
| ® | 6 | 4.5% |
| № | 4 | 3.0% |
| ™ | 3 | 2.3% |
| ¦ | 3 | 2.3% |
| © | 2 | 1.5% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 17 | |
| ‚ | 13 | |
| „ | 6 | 16.2% |
| { | 1 | 2.7% |
Math Symbol
| Value | Count | Frequency (%) |
| ± | 33 | |
| + | 5 | 12.8% |
| ¬ | 1 | 2.6% |
Currency Symbol
| Value | Count | Frequency (%) |
| ¤ | 6 | |
| € | 4 | |
| $ | 1 | 9.1% |
Initial Punctuation
| Value | Count | Frequency (%) |
| « | 3 | |
| ‹ | 2 | |
| ‘ | 1 | 16.7% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 99 | |
| — | 4 | 3.9% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 19 | |
| } | 1 | 5.0% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 10855 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 2982 |
Control
| Value | Count | Frequency (%) |
| (control character) | 3 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 6491682 | |
| Common | 1326887 | 17.0% |
| Cyrillic | 282 | < 0.1% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| . | 394351 | |
| @ | 325375 | |
| 1 | 91656 | 6.9% |
| 9 | 77564 | 5.8% |
| 2 | 66211 | 5.0% |
| 0 | 65808 | 5.0% |
| 3 | 52077 | 3.9% |
| 4 | 50932 | 3.8% |
| 7 | 50467 | 3.8% |
| 6 | 46665 | 3.5% |
| Other values (43) | 105781 | 8.0% |
Latin
| Value | Count | Frequency (%) |
| a | 1086025 | |
| m | 799518 | |
| i | 596157 | 9.2% |
| l | 454943 | 7.0% |
| o | 435204 | 6.7% |
| g | 383837 | 5.9% |
| c | 369638 | 5.7% |
| r | 318931 | 4.9% |
| n | 266539 | 4.1% |
| s | 242809 | 3.7% |
| Other values (42) | 1538081 |
Cyrillic
| Value | Count | Frequency (%) |
| а | 137 | |
| Ќ | 18 | 6.4% |
| Ш | 14 | 5.0% |
| Щ | 11 | 3.9% |
| ё | 9 | 3.2% |
| ѕ | 9 | 3.2% |
| ї | 9 | 3.2% |
| Д | 8 | 2.8% |
| в | 7 | 2.5% |
| џ | 7 | 2.5% |
| Other values (22) | 53 | 18.8% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 7818330 | |
| Cyrillic | 282 | < 0.1% |
| None | 182 | < 0.1% |
| Punctuation | 46 | < 0.1% |
| Letterlike Symbols | 7 | < 0.1% |
| Currency Symbols | 4 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 1086025 | |
| m | 799518 | 10.2% |
| i | 596157 | 7.6% |
| l | 454943 | 5.8% |
| o | 435204 | 5.6% |
| . | 394351 | 5.0% |
| g | 383837 | 4.9% |
| c | 369638 | 4.7% |
| @ | 325375 | 4.2% |
| r | 318931 | 4.1% |
| Other values (70) | 2654351 |
Cyrillic
| Value | Count | Frequency (%) |
| а | 137 | |
| Ќ | 18 | 6.4% |
| Ш | 14 | 5.0% |
| Щ | 11 | 3.9% |
| ё | 9 | 3.2% |
| ѕ | 9 | 3.2% |
| ї | 9 | 3.2% |
| Д | 8 | 2.8% |
| в | 7 | 2.5% |
| џ | 7 | 2.5% |
| Other values (22) | 53 | 18.8% |
None
| Value | Count | Frequency (%) |
| ° | 115 | |
| ± | 33 | 18.1% |
| ¤ | 6 | 3.3% |
| ® | 6 | 3.3% |
| § | 5 | 2.7% |
| ¦ | 3 | 1.6% |
| « | 3 | 1.6% |
| | 3 | 1.6% |
| ¶ | 2 | 1.1% |
| · | 2 | 1.1% |
| Other values (3) | 4 | 2.2% |
Punctuation
| Value | Count | Frequency (%) |
| ‚ | 13 | |
| … | 7 | |
| „ | 6 | |
| • | 5 | 10.9% |
| † | 5 | 10.9% |
| — | 4 | 8.7% |
| ‡ | 3 | 6.5% |
| ‹ | 2 | 4.3% |
| ‘ | 1 | 2.2% |
Letterlike Symbols
| Value | Count | Frequency (%) |
| № | 4 | |
| ™ | 3 |
Currency Symbols
| Value | Count | Frequency (%) |
| € | 4 |
Facebook
Text
MISSING 
| Distinct | 4826 |
|---|---|
| Distinct (%) | 23.4% |
| Missing | 1027992 |
| Missing (%) | 98.0% |
| Memory size | 8.0 MiB |
Length
| Max length | 72 |
|---|---|
| Median length | 8 |
| Mean length | 9.6078803 |
| Min length | 1 |
Characters and Unicode
| Total characters | 197759 |
|---|---|
| Distinct characters | 138 |
| Distinct categories | 14 |
| Distinct scripts | 3 |
| Distinct blocks | 6 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 3733 |
|---|---|
| Unique (%) | 18.1% |
Sample
| 1st row | 1.55E+15 |
|---|---|
| 2nd row | in |
| 3rd row | in |
| 4th row | 6.98E+14 |
| 5th row | 5.16E+14 |
| Value | Count | Frequency (%) |
| 1.00e+14 | 4556 | 20.7% |
| in | 1027 | 4.7% |
| 1.02e+16 | 111 | 0.5% |
| ltd | 51 | 0.2% |
| 1.20e+14 | 43 | 0.2% |
| 1.49e+14 | 42 | 0.2% |
| 1.95e+15 | 41 | 0.2% |
| 1.76e+15 | 40 | 0.2% |
| 1.86e+15 | 39 | 0.2% |
| 1.07e+14 | 39 | 0.2% |
| Other values (5157) | 16015 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 28555 | |
| . | 18060 | 9.1% |
| 4 | 16323 | 8.3% |
| E | 15844 | 8.0% |
| + | 15549 | 7.9% |
| 0 | 12633 | 6.4% |
| a | 7480 | 3.8% |
| 5 | 6490 | 3.3% |
| 2 | 5623 | 2.8% |
| i | 5341 | 2.7% |
| Other values (128) | 65861 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 86545 | |
| Lowercase Letter | 51814 | |
| Uppercase Letter | 20911 | 10.6% |
| Other Punctuation | 19973 | 10.1% |
| Math Symbol | 15612 | 7.9% |
| Space Separator | 2580 | 1.3% |
| Other Symbol | 205 | 0.1% |
| Open Punctuation | 31 | < 0.1% |
| Dash Punctuation | 27 | < 0.1% |
| Connector Punctuation | 23 | < 0.1% |
| Other values (4) | 38 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 15844 | |
| S | 566 | 2.7% |
| A | 483 | 2.3% |
| C | 341 | 1.6% |
| P | 316 | 1.5% |
| T | 316 | 1.5% |
| R | 312 | 1.5% |
| I | 301 | 1.4% |
| L | 260 | 1.2% |
| D | 257 | 1.2% |
| Other values (37) | 1915 | 9.2% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 7480 | |
| i | 5341 | 10.3% |
| m | 4720 | 9.1% |
| n | 3578 | 6.9% |
| o | 3305 | 6.4% |
| l | 3041 | 5.9% |
| r | 2858 | 5.5% |
| c | 2601 | 5.0% |
| e | 2529 | 4.9% |
| g | 2373 | 4.6% |
| Other values (31) | 13988 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 18060 | |
| @ | 1760 | 8.8% |
| / | 63 | 0.3% |
| & | 33 | 0.2% |
| : | 15 | 0.1% |
| ' | 11 | 0.1% |
| ‡ | 7 | < 0.1% |
| · | 6 | < 0.1% |
| • | 5 | < 0.1% |
| ¶ | 3 | < 0.1% |
| Other values (7) | 10 | 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 28555 | |
| 4 | 16323 | |
| 0 | 12633 | |
| 5 | 6490 | 7.5% |
| 2 | 5623 | 6.5% |
| 3 | 3745 | 4.3% |
| 6 | 3588 | 4.1% |
| 7 | 3218 | 3.7% |
| 8 | 3185 | 3.7% |
| 9 | 3185 | 3.7% |
Other Symbol
| Value | Count | Frequency (%) |
| ° | 196 | |
| ¦ | 4 | 2.0% |
| ® | 3 | 1.5% |
| ™ | 1 | 0.5% |
| № | 1 | 0.5% |
Math Symbol
| Value | Count | Frequency (%) |
| + | 15549 | |
| ± | 58 | 0.4% |
| ¬ | 5 | < 0.1% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 19 | |
| — | 6 | 22.2% |
| – | 2 | 7.4% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 18 | |
| ‚ | 11 | |
| „ | 2 | 6.5% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 2579 | |
| (other space separator) | 1 | < 0.1% |
Currency Symbol
| Value | Count | Frequency (%) |
| € | 7 | |
| ¤ | 4 |
Initial Punctuation
| Value | Count | Frequency (%) |
| « | 3 | |
| ‹ | 2 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 23 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 18 |
Control
| Value | Count | Frequency (%) |
| (control character) | 4 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 125042 | |
| Latin | 72259 | |
| Cyrillic | 458 | 0.2% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| E | 15844 | |
| a | 7480 | 10.4% |
| i | 5341 | 7.4% |
| m | 4720 | 6.5% |
| n | 3578 | 5.0% |
| o | 3305 | 4.6% |
| l | 3041 | 4.2% |
| r | 2858 | 4.0% |
| c | 2601 | 3.6% |
| e | 2529 | 3.5% |
| Other values (42) | 20962 |
Common
| Value | Count | Frequency (%) |
| 1 | 28555 | |
| . | 18060 | |
| 4 | 16323 | |
| + | 15549 | |
| 0 | 12633 | |
| 5 | 6490 | 5.2% |
| 2 | 5623 | 4.5% |
| 3 | 3745 | 3.0% |
| 6 | 3588 | 2.9% |
| 7 | 3218 | 2.6% |
| Other values (41) | 11258 | 9.0% |
Cyrillic
| Value | Count | Frequency (%) |
| а | 235 | |
| Ќ | 32 | 7.0% |
| ѕ | 21 | 4.6% |
| ї | 17 | 3.7% |
| Ё | 15 | 3.3% |
| Ш | 13 | 2.8% |
| џ | 13 | 2.8% |
| Ѓ | 12 | 2.6% |
| ё | 11 | 2.4% |
| Ї | 10 | 2.2% |
| Other values (25) | 79 | 17.2% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 196956 | |
| Cyrillic | 458 | 0.2% |
| None | 296 | 0.1% |
| Punctuation | 40 | < 0.1% |
| Currency Symbols | 7 | < 0.1% |
| Letterlike Symbols | 2 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 28555 | |
| . | 18060 | 9.2% |
| 4 | 16323 | 8.3% |
| E | 15844 | 8.0% |
| + | 15549 | 7.9% |
| 0 | 12633 | 6.4% |
| a | 7480 | 3.8% |
| 5 | 6490 | 3.3% |
| 2 | 5623 | 2.9% |
| i | 5341 | 2.7% |
| Other values (67) | 65058 |
Cyrillic
| Value | Count | Frequency (%) |
| а | 235 | |
| Ќ | 32 | 7.0% |
| ѕ | 21 | 4.6% |
| ї | 17 | 3.7% |
| Ё | 15 | 3.3% |
| Ш | 13 | 2.8% |
| џ | 13 | 2.8% |
| Ѓ | 12 | 2.6% |
| ё | 11 | 2.4% |
| Ї | 10 | 2.2% |
| Other values (25) | 79 | 17.2% |
None
| Value | Count | Frequency (%) |
| ° | 196 | |
| ± | 58 | 19.6% |
| µ | 8 | 2.7% |
| · | 6 | 2.0% |
| ¬ | 5 | 1.7% |
| ¦ | 4 | 1.4% |
| | 4 | 1.4% |
| ¤ | 4 | 1.4% |
| « | 3 | 1.0% |
| ® | 3 | 1.0% |
| Other values (3) | 5 | 1.7% |
Punctuation
| Value | Count | Frequency (%) |
| ‚ | 11 | |
| ‡ | 7 | |
| — | 6 | |
| • | 5 | |
| † | 2 | 5.0% |
| ‹ | 2 | 5.0% |
| ‰ | 2 | 5.0% |
| „ | 2 | 5.0% |
| – | 2 | 5.0% |
| … | 1 | 2.5% |
Currency Symbols
| Value | Count | Frequency (%) |
| € | 7 |
Letterlike Symbols
| Value | Count | Frequency (%) |
| ™ | 1 | |
| № | 1 |
Twitter
Text
MISSING 
| Distinct | 8027 |
|---|---|
| Distinct (%) | 74.3% |
| Missing | 1037773 |
| Missing (%) | 99.0% |
| Memory size | 8.0 MiB |
Length
| Max length | 91 |
|---|---|
| Median length | 46 |
| Mean length | 17.995742 |
| Min length | 1 |
Characters and Unicode
| Total characters | 194390 |
|---|---|
| Distinct characters | 122 |
| Distinct categories | 15 |
| Distinct scripts | 3 |
| Distinct blocks | 6 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
Unique
| Unique | 7549 |
|---|---|
| Unique (%) | 69.9% |
Sample
| 1st row | 1.00E+14 |
|---|---|
| 2nd row | pujariganguli12@gmail.com |
| 3rd row | Own Jewellery Showroom |
| 4th row | chouhanb.64@gmail.com |
| 5th row | srinunaidupippalla@gmail.com |
| Value | Count | Frequency (%) |
| 1.00e+14 | 1366 | 12.3% |
| in | 97 | 0.9% |
| 1.02e+16 | 30 | 0.3% |
| india | 21 | 0.2% |
| 1.65e+15 | 19 | 0.2% |
| 1.60e+15 | 19 | 0.2% |
| 1.64e+15 | 18 | 0.2% |
| 1.66e+15 | 17 | 0.2% |
| 1.62e+15 | 15 | 0.1% |
| 1.63e+15 | 15 | 0.1% |
| Other values (8201) | 9533 |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 22082 | 11.4% |
| m | 16117 | 8.3% |
| i | 12399 | 6.4% |
| . | 11902 | 6.1% |
| o | 9618 | 4.9% |
| l | 9230 | 4.7% |
| 1 | 7925 | 4.1% |
| c | 7764 | 4.0% |
| g | 7712 | 4.0% |
| r | 6997 | 3.6% |
| Other values (112) | 82644 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 136725 | |
| Decimal Number | 30021 | 15.4% |
| Other Punctuation | 18612 | 9.6% |
| Uppercase Letter | 4880 | 2.5% |
| Math Symbol | 3418 | 1.8% |
| Space Separator | 507 | 0.3% |
| Connector Punctuation | 149 | 0.1% |
| Other Symbol | 51 | < 0.1% |
| Open Punctuation | 9 | < 0.1% |
| Close Punctuation | 7 | < 0.1% |
| Other values (5) | 11 | < 0.1% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| E | 3472 | |
| S | 141 | 2.9% |
| A | 135 | 2.8% |
| C | 105 | 2.2% |
| I | 92 | 1.9% |
| M | 79 | 1.6% |
| P | 78 | 1.6% |
| N | 76 | 1.6% |
| O | 69 | 1.4% |
| R | 68 | 1.4% |
| Other values (33) | 565 | 11.6% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 22082 | |
| m | 16117 | |
| i | 12399 | 9.1% |
| o | 9618 | 7.0% |
| l | 9230 | 6.8% |
| c | 7764 | 5.7% |
| g | 7712 | 5.6% |
| r | 6997 | 5.1% |
| n | 5871 | 4.3% |
| s | 5249 | 3.8% |
| Other values (29) | 33686 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 11902 | |
| @ | 6665 | |
| / | 20 | 0.1% |
| & | 10 | 0.1% |
| • | 4 | < 0.1% |
| : | 3 | < 0.1% |
| ? | 2 | < 0.1% |
| · | 2 | < 0.1% |
| † | 2 | < 0.1% |
| § | 1 | < 0.1% |
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 7925 | |
| 0 | 4675 | |
| 4 | 4258 | |
| 5 | 2363 | 7.9% |
| 9 | 2099 | 7.0% |
| 2 | 2031 | 6.8% |
| 3 | 1692 | 5.6% |
| 7 | 1681 | 5.6% |
| 6 | 1654 | 5.5% |
| 8 | 1643 | 5.5% |
Math Symbol
| Value | Count | Frequency (%) |
| + | 3398 | |
| ± | 18 | 0.5% |
| ~ | 1 | < 0.1% |
| = | 1 | < 0.1% |
Other Symbol
| Value | Count | Frequency (%) |
| ° | 45 | |
| ® | 4 | 7.8% |
| № | 2 | 3.9% |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 7 | |
| ‚ | 1 | 11.1% |
| „ | 1 | 11.1% |
Currency Symbol
| Value | Count | Frequency (%) |
| € | 3 | |
| ¤ | 2 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 507 |
Connector Punctuation
| Value | Count | Frequency (%) |
| _ | 149 |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 7 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 3 |
Format
| Value | Count | Frequency (%) |
| (format character) | 1 |
Final Punctuation
| Value | Count | Frequency (%) |
| › | 1 |
Modifier Symbol
| Value | Count | Frequency (%) |
| ^ | 1 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 141489 | |
| Common | 52788 | 27.2% |
| Cyrillic | 113 | 0.1% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| a | 22082 | |
| m | 16117 | |
| i | 12399 | 8.8% |
| o | 9618 | 6.8% |
| l | 9230 | 6.5% |
| c | 7764 | 5.5% |
| g | 7712 | 5.5% |
| r | 6997 | 4.9% |
| n | 5871 | 4.1% |
| s | 5249 | 3.7% |
| Other values (42) | 38450 |
Common
| Value | Count | Frequency (%) |
| . | 11902 | |
| 1 | 7925 | |
| @ | 6665 | |
| 0 | 4675 | 8.9% |
| 4 | 4258 | 8.1% |
| + | 3398 | 6.4% |
| 5 | 2363 | 4.5% |
| 9 | 2099 | 4.0% |
| 2 | 2031 | 3.8% |
| 3 | 1692 | 3.2% |
| Other values (31) | 5780 |
Cyrillic
| Value | Count | Frequency (%) |
| а | 57 | |
| Ќ | 10 | 8.8% |
| Ў | 4 | 3.5% |
| ї | 3 | 2.7% |
| ё | 3 | 2.7% |
| Ђ | 3 | 2.7% |
| І | 3 | 2.7% |
| О | 2 | 1.8% |
| П | 2 | 1.8% |
| б | 2 | 1.8% |
| Other values (19) | 24 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 194187 | |
| Cyrillic | 113 | 0.1% |
| None | 76 | < 0.1% |
| Punctuation | 9 | < 0.1% |
| Currency Symbols | 3 | < 0.1% |
| Letterlike Symbols | 2 | < 0.1% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| a | 22082 | 11.4% |
| m | 16117 | 8.3% |
| i | 12399 | 6.4% |
| . | 11902 | 6.1% |
| o | 9618 | 5.0% |
| l | 9230 | 4.8% |
| 1 | 7925 | 4.1% |
| c | 7764 | 4.0% |
| g | 7712 | 4.0% |
| r | 6997 | 3.6% |
| Other values (68) | 82441 |
Cyrillic
| Value | Count | Frequency (%) |
| а | 57 | |
| Ќ | 10 | 8.8% |
| Ў | 4 | 3.5% |
| ї | 3 | 2.7% |
| ё | 3 | 2.7% |
| Ђ | 3 | 2.7% |
| І | 3 | 2.7% |
| О | 2 | 1.8% |
| П | 2 | 1.8% |
| б | 2 | 1.8% |
| Other values (19) | 24 |
None
| Value | Count | Frequency (%) |
| ° | 45 | |
| ± | 18 | 23.7% |
| ® | 4 | 5.3% |
| µ | 3 | 3.9% |
| · | 2 | 2.6% |
| ¤ | 2 | 2.6% |
| | 1 | 1.3% |
| § | 1 | 1.3% |
Punctuation
| Value | Count | Frequency (%) |
| • | 4 | |
| † | 2 | |
| ‚ | 1 | 11.1% |
| › | 1 | 11.1% |
| „ | 1 | 11.1% |
Currency Symbols
| Value | Count | Frequency (%) |
| € | 3 |
Letterlike Symbols
| Value | Count | Frequency (%) |
| № | 2 |
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out visual patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
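One way to read "nullity correlation" is as the Pearson correlation between two columns' missingness indicators (1 where the value is missing, 0 where it is present). The sketch below works under that assumption and uses only the standard library; the function name `nullity_correlation` and the toy rows are hypothetical, and the report's own heatmap may be computed differently.

```python
from math import sqrt


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns.

    Returns None when either column is always missing or never missing,
    because the correlation is undefined for a constant indicator.
    """
    a = [1.0 if row.get(col_a) is None else 0.0 for row in rows]
    b = [1.0 if row.get(col_b) is None else 0.0 for row in rows]
    n = len(rows)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)


# Hypothetical toy rows: None marks a missing cell.
rows = [
    {"Email": "present", "Facebook": "present", "Gender": None},
    {"Email": None, "Facebook": None, "Gender": None},
    {"Email": "present", "Facebook": "present", "Gender": None},
    {"Email": None, "Facebook": None, "Gender": None},
]

email_facebook = nullity_correlation(rows, "Email", "Facebook")
email_gender = nullity_correlation(rows, "Email", "Gender")
print(email_facebook, email_gender)
```

A value near 1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is absent, and a column that is always missing (like `Gender` in the toy rows) has no defined correlation.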
| | Number | Carrier | Name | Gender | Address | JobTitle | CompanyName | Email | Facebook | Twitter |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9.170000e+11 | BSNL MOBILE | Jikku Ayush Kids | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN |
| 1 | 9.170000e+11 | BSNL MOBILE | Goswami Ritu | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN |
| 2 | 9.170000e+11 | BSNL MOBILE | Sruthi | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN |
| 3 | 9.170000e+11 | BSNL MOBILE | Arshiya Kadiri | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN |
| 4 | 9.170000e+11 | BSNL MOBILE | Singampalli Naresh | NaN | Andhra Pradesh in | NaN | Police | singampalli1483@gmail.com | NaN | NaN |
| 5 | 9.170000e+11 | BSNL MOBILE | Ramakka | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN |
| 6 | 9.170000e+11 | BSNL MOBILE | Ram Ram | NaN | Andhra Pradesh in | NaN | NaN | NaN | NaN | NaN |
| 7 | 9.170000e+11 | BSNL MOBILE | G Thulasi Ut | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN |
| 8 | 9.170000e+11 | BSNL MOBILE | Shailaja.v | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN |
| 9 | 9.170000e+11 | BSNL MOBILE | Ram Alluri | NaN | Andhra Pradesh in | NaN | NaN | alluri081996@gmail.com | NaN | NaN |
| | Number | Carrier | Name | Gender | Address | JobTitle | CompanyName | Email | Facebook | Twitter |
|---|---|---|---|---|---|---|---|---|---|---|
| 1048565 | 9.190000e+11 | BSNL MOBILE | Chandrakala | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN |
| 1048566 | 9.190000e+11 | BSNL MOBILE | Syam | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN |
| 1048567 | 9.190000e+11 | BSNL MOBILE | Kaithi Rajashekar | NaN | Andhra Pradesh in | NaN | NaN | NaN | NaN | NaN |
| 1048568 | 9.190000e+11 | BSNL MOBILE | Kareem | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN |
| 1048569 | 9.190000e+11 | BSNL MOBILE | Abdul Rahman | NaN | Andhra Pradesh in | NaN | NaN | abdul786@gmail.com | NaN | NaN |
| 1048570 | 9.190000e+11 | BSNL MOBILE | Murali Krishna | NaN | Andhra Pradesh in | NaN | NaN | simurali2133@gmail.com | NaN | NaN |
| 1048571 | 9.190000e+11 | BSNL MOBILE | Gopi Bayyapuneni | MALE | Andhra Pradesh in | NaN | NaN | bayyapunenigopi@gmail.com | 2.53E+14 | NaN |
| 1048572 | 9.190000e+11 | BSNL MOBILE | Kamesh Kamesh | NaN | Andhra Pradesh in | NaN | NaN | eppalakameshwar@gmail.com | NaN | NaN |
| 1048573 | 9.190000e+11 | BSNL MOBILE | Puli | sampath | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN |
| 1048574 | 9.190000e+11 | BSNL MOBILE | NaN | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN |
Most frequently occurring
| | Number | Carrier | Name | Gender | Address | JobTitle | CompanyName | Email | Facebook | Twitter | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6521 | 9.180000e+11 | BSNL MOBILE | NaN | NaN | Punjab | NaN | NaN | NaN | NaN | NaN | 22506 |
| 30890 | 9.190000e+11 | BSNL MOBILE | NaN | NaN | Andhra Pradesh in | NaN | NaN | NaN | NaN | NaN | 13183 |
| 6517 | 9.180000e+11 | BSNL MOBILE | NaN | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN | 7704 |
| 30888 | 9.190000e+11 | BSNL MOBILE | NaN | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN | 6977 |
| 6519 | 9.180000e+11 | BSNL MOBILE | NaN | NaN | Kerala | NaN | NaN | NaN | NaN | NaN | 3806 |
| 3141 | 9.170000e+11 | BSNL MOBILE | NaN | NaN | Andhra Pradesh in | NaN | NaN | NaN | NaN | NaN | 1457 |
| 3140 | 9.170000e+11 | BSNL MOBILE | NaN | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN | 1423 |
| 22953 | 9.190000e+11 | BSNL MOBILE | Ravi | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN | 553 |
| 13814 | 9.190000e+11 | BSNL MOBILE | Jyothi | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN | 489 |
| 22377 | 9.190000e+11 | BSNL MOBILE | Ramesh | NaN | Andhra Pradesh | NaN | NaN | NaN | NaN | NaN | 455 |
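A table of this kind can be approximated by counting identical rows and keeping only those that occur more than once. The following is a minimal standard-library sketch; the helper `most_frequent_duplicates` and the toy rows are assumptions for illustration, and missing cells are represented as `None` rather than NaN so that identical rows compare equal.

```python
from collections import Counter


def most_frequent_duplicates(rows, top=10):
    """Return (row, count) pairs for rows that occur more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    # Keep only rows that are actually duplicated.
    repeated = Counter({row: n for row, n in counts.items() if n > 1})
    return repeated.most_common(top)


# Hypothetical toy rows (Number, Carrier, Name, Address); None marks a missing cell.
rows = (
    [("9.18e+11", "BSNL MOBILE", None, "Punjab")] * 3
    + [("9.18e+11", "BSNL MOBILE", None, "Kerala")] * 2
    + [("9.19e+11", "BSNL MOBILE", None, "Andhra Pradesh")]
)

duplicates = most_frequent_duplicates(rows)
for row, n in duplicates:
    print(n, row)
```

Because the Number column in this dataset has only three distinct values, rows with no name collapse into a handful of very large duplicate groups, which is the pattern the table shows.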